The field of high-performance computing currently faces dual challenges: important technical problems that require a skilled workforce and the need to recruit more computational researchers, especially those from underrepresented communities. This conversation with Lois Curfman McInnes of Argonne National Laboratory examines both the complexity in building scientific software and the work needed to build the HPC workforce of the future.
You’ll meet:
- Lois Curfman McInnes is a senior computational scientist in the mathematics and computer science division at Argonne National Laboratory. She served as deputy director for the software technology focus area of the U.S. Department of Energy’s Exascale Computing Project and completed her Ph.D. in applied mathematics at the University of Virginia.
From the episode:
Lois discussed the Message Passing Interface and her early work on the PETSc library at Argonne National Laboratory with Bill Gropp, Barry Smith, Satish Balay and many others in the PETSc developer and user community.
In our discussion about software sustainability, Lois talked about the United Kingdom’s Software Sustainability Institute, founded in 2010 as one of the world’s first organizations focused on improving research software.
Lois was recently deputy director for software technology for the Exascale Computing Project (ECP). Now that ECP has ended, software stewardship organizations supported by the DOE’s Office of Advanced Scientific Computing Research have organized the Consortium for the Advancement of Scientific Software, known as CASS. Lois and collaborators in the PESO project within CASS are working to steward a sustainable scientific software ecosystem comprising libraries and tools that deliver the latest high-performance algorithms and capabilities for DOE mission-critical applications and beyond.
As we discussed challenges in expanding the HPC workforce, Lois mentioned the Broadening Participation Initiative that was launched by ECP in 2021. That effort supported an Introduction to HPC bootcamp held at NERSC in summer 2023, and Lois mentioned the work of Jini Ramprakash and Paige Kinsley of Argonne and Mary Ann Leung of the Sustainable Horizons Institute. (Mary Ann was a recipient of a DOE Computational Science Graduate Fellowship from 2001 to 2005.) Similar training opportunities have been available through Computing4Change and the Advanced Computing for Social Change Institute at the Texas Advanced Computing Center.
Mary Ann Leung also had a leading role in the Sustainable Research Pathways Internship and Workforce Development Program with Berkeley Lab’s David Brown and Silvia Crivelli. That program started at Berkeley Lab in 2015 before expanding to 10 labs with support from ECP in 2021. In 2024, that program continues with support from individual research projects and from the DOE’s Science Undergraduate Laboratory Internships program and Visiting Faculty Program.
Lois presented an invited talk at the SC23 meeting: Broadening Participation in HPC: Together We Can Make a Difference.
Additional Reading:
- Basic Research Needs in The Science of Scientific Software Development and Use: Investment in Software is Investment in Science
- How Community Software Ecosystems Can Unlock the Potential of Exascale Computing
- Building a Diverse and Inclusive HPC Community for Mission-Driven Team Science
- A cast of thousands: How the IDEAS Productivity project has advanced software productivity and sustainability
- Intro to HPC Bootcamp: Engaging New Communities Through Energy Justice Projects
Related Episodes:
Lois mentioned ECP’s work on teams of teams with Elaine Raybourn of Sandia National Laboratories. Elaine was a guest for our Future of Work series (part 1 and part 2) in season 2 and was mentioned when Margaret Lawson discussed HPC and ethics in season 3.
For more on diversity, equity and inclusion in HPC, check out our season 2 episode with Valerie Taylor of Argonne National Laboratory.
Transcript
Sarah Webb 00:00
I’m your host, Sarah Webb. And we’re back for season five of Science in Parallel, a podcast about people and projects in computational science. For the rest of 2024 we’ll be continuing our series on creativity in computing. Stay tuned for monthly episodes leading up to the SC24 meeting in Atlanta in November, where the theme will be HPC Creates. That’s not all: Science in Parallel has a new website. Check out new episodes, past favorites and the extra details in our show notes at scienceinparallel.org.
00:35-00:56 [Theme music plays.]
Sarah Webb 00:57
In this episode, you’ll hear from Lois Curfman McInnes, a senior computational scientist at Argonne National Laboratory, and who served as Deputy Director for software technology during the Department of Energy’s Exascale Computing Project, or ECP. Lois and I initially spoke last November at the SC23 meeting in Denver about her career in scientific software at Argonne, and her keynote talk about building a more diverse workforce in high-performance computing, or HPC. Both areas are creative and collaborative challenges. The Exascale Computing Project ended in 2023 but work to build software sustainability partnerships and train an inclusive workforce continues. Stay tuned to hear more.
Sarah Webb 01:43
Welcome, Lois. It is great to have you on the podcast.
Lois Curfman McInnes 01:46
Thank you so much. I’m grateful for the opportunity to be here.
Sarah Webb 01:49
So, Lois, you’ve been working on HPC and software for a long time. Let’s start with your core interest in software. As a mathematician, as a computer scientist, there’s lots that you can do. Why software?
Lois Curfman McInnes 02:03
I guess I could go back to my earliest start. I majored in math and physics as an undergraduate at Muhlenberg College, a small liberal arts school in Pennsylvania. And then I went to graduate school in applied math at the University of Virginia. While in graduate school, I became involved in scientific computing, working on numerical methods for the iterative solution of compressible flow problems. And as part of that, I wrote software, and it helped me to do my work, my analysis of linear and nonlinear solvers. But it was, you know, typical graduate student’s software. So, after graduate school, I had the good fortune of moving on to have a postdoc at Argonne National Lab. When I arrived there, which happened to be in 1993, it was at the time when the Message Passing Interface standard was being finalized.
Sarah Webb 03:01
The Message Passing Interface, also referred to as MPI, was an important software innovation for coordinating and distributing the computational workload across multiple processors. And it remains important in today’s exascale systems.
Lois Curfman McInnes 03:16
And I had the good fortune to land across the hall literally from Bill Gropp, who was one of the leaders of that community in developing the interface and implementing it in freely available software: MPICH. So I also had the chance to collaborate with Bill and another brilliant guy, Barry Smith, who already had started working on publicly available software for numerical analysis and scalable numerical solvers called the PETSc library (or Portable, Extensible Toolkit for Scientific Computation). So my background in applied math and nonlinear solvers was really quite complementary to what Bill and Barry already had started. So I joined forces with them and focused on developing nonlinear solvers as part of the PETSc library. At the time, we transitioned the initial implementations that had been prototypes pre-MPI, to devise a new version that was totally using the MPI standard. Learning from Bill and Barry, who are just amazingly creative experts, we made a lot of wonderful headway, also working with many others in the community.
Lois Curfman McInnes 04:26
Shortly after beginning to work with them, we were able to engage Satish Balay, who’s just an incredibly wonderful research software engineer who has been a part of our team for decades, and also to grow our collaborations with many others in the community where our software in the PETSc library has focused on enabling research in numerical algorithms as motivated by challenges in scientific domains across many different domains, physics, chemistry, all sorts. So we’ve recognized that there’s no one specific approach that works for everyone. But it’s quite important to focus on good software design so that individuals and teams can readily expand, adapt and experiment in order to figure out what approaches for numerical methods are most effective, given particular motivating applications, computing architectures and priorities and challenges. So that’s been a great foundation for me.
Lois Curfman McInnes 05:24
And in more recent years, I’ve had the opportunity to collaborate with many, many other people throughout both the national and international communities who also develop wonderful software. And so that has led me and many others in the community to focus on the interrelationships among the software that many of us develop and how can we as a community do more effective approaches to enable us to collaborate together so that we can address emerging challenges in science, which, frankly, require our collaboration. Software is the way in which we encapsulate our expertise, be that in numerical methods, my particular area, or chemistry, or data analysis, or visualization software; it is the way we encapsulate what we know and what we have investigated so that other people can use it and benefit from it in their simulations. And we absolutely need those collaborations through software, because there’s no one single person or team who can do all of it. It’s just too broad, too difficult.
Sarah Webb 06:28
Computers have changed a lot since you got started on this. And I’m wondering how that has shaped how you’ve thought about software? Can you talk about maybe how it’s evolved over time and over these, you know, over these years that you’ve been working in this space?
Lois Curfman McInnes 06:45
Certainly complexity continues to grow. As we’re empowered by continually advancing computing resources, the new computing resources enable all of us to tackle so many more complex problems, bringing together multiscale multiphysics, modeling and simulation. Now, we’re working increasingly toward coupling that with data from experiments and from observational sites, and even more building across analysis, incorporating learning and emerging AI technologies. So as the computing architectures have grown, we certainly as a community have been building up additional functionality at the lowest layers of the software stack in programming models and runtimes to deal with that, and then toward the intermediate layers of the software stack building on top of those lower-level layers, and continually advancing.
Lois Curfman McInnes 07:42
So we are certainly in an era now where working on achieving good performance on heterogeneous architectures at all scales of computing is very challenging, much harder than it was 20 years ago. In those earlier days, we were focused on a CPU-only style of writing code, which is fairly straightforward even in parallel. I can say that in retrospect. Back in 1993, it didn’t seem so straightforward. But as we look back now, and we think about the complexities needed to tackle heterogeneous architectures, which increasingly often incorporate GPUs, as well as CPUs and other types of architectures, we as a community need to figure out how to do that, how to do it effectively, how to do it in a portable way, and how to do it in a way that enables many people across the spectrum to fully leverage these new architectures.
Lois Curfman McInnes 08:41
Right now, I’d say we have a long way to go to fully get there, because right now, it’s very complicated. In order to fully leverage the power even on individual desktops there are architectures that are heterogeneous, requiring new approaches to fully leverage that heterogeneity. That’s very complex. So we, as a community, need further research on algorithms and approaches to software and the numerical analysis to enable us to fully exploit the power of these computers, where we really have challenges as a community where we have to figure out how to fully leverage these architectures and empower people who may want to work at higher levels of abstraction, not down with the nitty-gritty, lower-level details.
Sarah Webb 09:28
So that leads me to go back and ask you something about the meaning of software for this larger HPC ecosystem. There’s a heading in one of your papers that talked about software as the first citizen. And I want you to talk a little bit about what software means to the HPC community and maybe the ways in which software is a bit underappreciated.
Lois Curfman McInnes 09:47
Many people in the community– in recent years across the international community– have been working very hard to advance understanding of the critical role that software plays in research in science and technology overall. As we know in academia, and even in research lab environments, traditional metrics for career advancement focus on writing papers in high-quality journals and conferences. Conferences, like Supercomputing, which is wonderful. And that’s certainly an important part for us as international communities to communicate about our work. However, equally important is to recognize and value and incorporate into metrics for career advancement the essential work on designing, building, sustaining and evolving software because honestly without software we would not have an HPC community or this wonderful Supercomputing conference.
Lois Curfman McInnes 10:47
Software is the way we do our work, is the way we do our research, the way we advance scientific and technology discoveries, and we as a community absolutely need to acknowledge that and to devise ways to adequately build in funding to support the work that’s needed to do a wonderful job in designing and developing and sustaining software. Frankly, I think many people have the desire to write good quality software. But oftentimes, there’s just not adequate time because the incentive structures both in terms of funding and in terms of career metrics for progression don’t build in adequate funding or time for people to focus enough on advancing software practices. So that’s something that many, many people in the international community are trying to change. I’d like especially to draw attention to the growing movement of research software engineers started in the United Kingdom by the Software Sustainability Institute a number of years ago. That movement has been growing across many different countries.
Sarah Webb 11:55
The Exascale Computing Project brought together large teams across the DOE labs to work on an array of challenges, including a sustainable software ecosystem. That work was a fundamental part of the years-long effort to build the new Frontier and Aurora supercomputers and ensure that they can fulfill their mission and important science applications. Now that ECP has ended, software stewardship organizations came together in 2024 to form the Consortium for the Advancement of Scientific Software, known as CASS. We have a link to their website in our show notes.
Sarah Webb 12:30
What I find so interesting about HPC is teamwork. And so, I want you to talk a little bit about the human aspect of collaboration.
Lois Curfman McInnes 12:40
The human aspect of collaboration is the foundation of all that we do. We are not just software packages. We are people who create innovative solutions that we often encapsulate in software. So the human element is some aspect of our work that I think increasingly people are recognizing we need to truly lean into and pay attention to. One of the great privileges of my work in the last few years has been to collaborate with the team called the IDEAS Productivity Project, which includes collaborators across the DOE lab system and some universities, where we’ve been looking at how our community can be more productive, help our developers be more productive, and our resulting software be more sustainable. And one of the themes explored by some of the members of that team, in particular Elaine Raybourn, a Sandia scientist who focuses on social issues in computing, is looking at the concept of not just teams, but collaborating teams of teams and how we as a community can approach that and be more effective.
Sarah Webb 12:41
Elaine is a past guest on the podcast. I spoke with her for our season-two series on remote work. And Margaret Lawson mentioned Elaine in season three when we discussed HPC and ethics. Check out our website if you’d like to learn more.
[Brief music transition.]
Sarah Webb 14:13
Then I asked Lois about how that focus on collaboration ties into software.
Lois Curfman McInnes 14:19
We believe that diversity of our teams is incredibly important. And I would say our community overall has done a pretty good job at incorporating technical diversity. We talked about that earlier in the interview. Where I believe we still have plenty of opportunity for improvement is improving our mix of people in terms of their backgrounds on many other fronts besides just the technology front. We have opportunities to reach out to underrepresented groups in order to present the opportunities in computing as ways to address topics of importance to them and to provide pathways for engaging new communities in ways that are appealing to them. Traditionally, I believe many of us in the math and computing communities have chosen to focus on training and education from a point of view of the basics that we use in doing our work. I, myself, and many collaborators have done really very good technical training on topics such as numerical linear algebra in parallel, or the approaches for computing science technologies, really focusing on teaching from the technology roots level.
Lois Curfman McInnes 15:40
What we haven’t done so much yet is focus on the why. Why should new communities care about that? Why should they get involved? And there’s been some really innovative work throughout a number of groups in the community to begin addressing this. The team that I’m working with in the Exascale Computing Project launched the Broadening Participation Initiative in the fall of 2021, where we’ve been trying very intentionally to take a multifaceted approach at engaging new communities and much of that focuses on leaning into the why of using high-performance computing to address challenges in our society– problems with social impact. So, in August of 2023, a multi-institutional team consisting of staff at Argonne, Oak Ridge and Lawrence Berkeley National Labs at the computing centers there and a few collaborators at universities got together to host a bootcamp that focused on introducing early career students to high-performance computing through a lens of projects in energy justice.
Sarah Webb 16:51
Jini Ramprakash, deputy director of the Argonne Leadership Computing Facility, and Paige Kinsley, that facility’s education and outreach lead, led the effort in partnership with Mary Ann Leung of the Sustainable Horizons Institute.
Lois Curfman McInnes 17:13
And this session brought in a wide range of students from 22 states, most of them were undergraduates, many from minority-serving institutions, and engaged them in a multifaceted program, where they were introduced to the excitement of team-based science, especially in high-performance computing. They were introduced to team-based approaches to working collaboratively together across disciplines to address real problems involving energy. That is, of course, the mission of the Department of Energy, our space. We have received very positive feedback from the students, and many of them in a post-event survey indicate strong interest in further pursuing a career in high-performance computing, and also strong interest in potentially pursuing a career at a Department of Energy national lab.
Lois Curfman McInnes 17:13
And we found that this program opened up awareness of many of these students to the existence of these fields and opportunities. We as a high-performance computing community overall, are known within scientific circles. But most people, including I would say, myself when I was a kid growing up in Pennsylvania, had no exposure to scientific computing or high-performance computing. So finding accessible ways to introduce people across our wide world to the power and the wonder of computing is, I believe, incredibly important. So we’re looking at now trying to identify funding and strategies to build on our early successes with the intro HPC bootcamp in order to move forward.
Sarah Webb 18:56
So what other pieces do you see as important in building this broader community? And both preparing people for these careers and finding the people who would be interested in really pursuing these in HPC?
Lois Curfman McInnes 19:12
It’s really a complicated question, isn’t it? Because there’s so many layers and levels where there are opportunities to raise awareness and offer participation, going back to, you know, education at the earliest levels in K through 12, moving through university opportunities and graduate opportunities. So we need all things in many levels. And I would encourage each of us to choose what part of interventions are accessible and interesting to us and devote some of our volunteer time to engage, from getting involved in science and career opportunities targeted at elementary, middle school, high school students, through getting involved in mentoring and deliberately engaging people from all sorts of backgrounds in our work and in our courses at colleges and universities, and in research labs and industry. So a second avenue of work is the Sustainable Research Pathways internship and workforce development program. This is a program initially conceived of and led by Mary Ann Leung of Sustainable Horizons Institute with David Brown and Silvia Crivelli at Lawrence Berkeley National Lab in 2015.
Sarah Webb 20:25
As part of the Exascale Computing Project’s Broadening Participation Initiative, ECP partnered to scale the program across the DOE lab community in 2022 and 2023.
Lois Curfman McInnes 20:37
We’ve succeeded in having a variety of students from underrepresented groups and also faculty working with students in some cases come to the various DOE labs. Ten of them have participated over the last two years for summer internships and also for mentoring and engagement across the multi-lab community.
Sarah Webb 21:00
Although the Exascale Computing Project has now ended, the Sustainable Research Pathways Program continues. The DOE’s Computing Research Leadership Council got involved in 2023. And in 2024, seven DOE labs are hosting the summer internships, supporting students and faculty through research project funding and through the DOE’s Science Undergraduate Laboratory Internships and the Visiting Faculty Program. Lois and her colleagues would like to raise more funds so that they can continue the summer boot camp in the future, and support research opportunities for students and faculty.
Sarah Webb 21:39
As we look toward the future, obviously, computing is changing a lot with AI, with the amount of experimental data analysis, in addition to modeling and simulation, it seems like heterogeneous architectures, all of those things. So there are all the software challenges you’ve talked about. You’ve talked about workforce. How do you see these things coming together for you as an individual working in the space and in the broader projects that you work on?
Lois Curfman McInnes 22:05
I believe that your question is one of the most important challenges of our time for our entire community to think about. We have really a necessary mission to stop and think again about the standard practices that our community has been using for decades to engage people in our work and how we can do a better job at looking at alternative ways to engage diverse people. But we’re not going to do that if we keep using the same approaches that we’ve used for the last 40 years, which, frankly, are gatekeeping approaches that keep out many people. What I have noticed is that many people who already have a pathway into our community start off in very privileged environments where they have good education from the earliest levels in the K through 12 and solid education at the undergraduate level, and then they move on. That’s wonderful for that set of people. But there are so many talented and creative people with incredible potential who also could do this if we could find and create and deliberately plan for alternative entry points, alternative pathways that really don’t circumvent our current approaches where people are left out just due to the traditional gating factors.
Sarah Webb 23:28
We’ve been talking about creativity and how important creativity is for this work. So I’m going to ask you what creativity means to you.
Lois Curfman McInnes 23:41
Oh, that’s an interesting question. Creativity in context of the computing sciences, to me, means bringing together a variety of diverse people to identify the most challenging problems to tackle, and creative approaches to addressing them. I really love the natural, collaborative environment and spirit of the computing sciences. That’s one thing that has attracted and drawn me to this over and over again throughout my career. And I’ve benefited from the privilege of working with so many creative people. But as we move forward and we’re seeing even larger challenges for our community, for our world, our society, [music fades in] I believe we absolutely need to figure out how to engage our entire population, in devising creative solutions to the most challenging problems in our world.
Sarah Webb 24:46
Thank you, Lois. I’ve really enjoyed talking with you. Thank you for your time.
Lois Curfman McInnes 24:49
Thanks so much. I appreciate the chance to be here.
Sarah Webb 24:52
[Theme music in background.] To learn more about Lois, the Consortium for the Advancement of Scientific Software, the Sustainable Research Pathways program and more, check out this episode’s show notes on our new website at scienceinparallel.org. Science in Parallel is produced by the Krell Institute and is a media project of the Department of Energy Computational Science Graduate Fellowship program. Any opinions expressed are those of the speaker and not those of their employers, the Krell Institute or the U.S. Department of Energy. Our music is by Steve O’Reilly. This episode was written, produced and edited by me: Sarah Webb.